in vitro. The nucleotide integrase is prepared by introducing a DNA molecule which comprises a group II intron DNA sequence into a host
cell and expressing the group II intron DNA sequence in the host cell such that RNP particles having nucleotide integrase activity are formed which comprise an excised group II intron RNA encoded by the introduced DNA molecule and a group II
intron-encoded protein encoded by the introduced DNA molecule. Thereafter the nucleotide integrase is isolated from the cell. In another
embodiment, the nucleotide integrase is prepared by combining in vitro an excised group II intron RNA, hereafter "exogenous RNA", with 
a group II intron-encoded protein. In another embodiment, the nucleotide integrase is prepared by combining "exogenous RNA" with an
RNA-protein complex which comprises a group II intron-encoded protein. 
METHODS FOR PREPARING NUCLEOTIDE INTEGRASES

BACKGROUND 

Nucleotide integrases are molecular complexes that are capable of
cleaving double-stranded DNA substrates at specific recognition sites and
of concomitantly inserting nucleic acid molecules into the DNA substrate at
the cleavage site. Thus, nucleotide integrases are useful tools,
particularly for genome mapping and for genetic engineering.

Structurally, nucleotide integrases are ribonucleoprotein (RNP)
particles that comprise an excised group II intron RNA and a group II
intron-encoded protein, which is bound to the group II intron RNA. At
present, nucleotide integrases are made by two approaches. The first
approach involves isolating the nucleotide integrase from source organisms;
both the RNA and protein subunits of the nucleotide integrase are encoded
by the DNA in such organisms. To obtain nucleotide integrases other than
wild type, the source organisms are mutagenized. The mutagenesis is a
laborious, multistep process which yields limited quantities of nucleotide
integrase.

The second approach used to prepare nucleotide integrases involves
combining, in vitro, an exogenous, excised group II intron RNA with an
RNA-protein complex in which the group II intron-encoded protein is
associated with a splicing-defective group II intron RNA rather than the
excised group II intron RNA. The RNA-protein complex therefore lacks
nucleotide integrase activity. The exogenous RNA displaces the
splicing-defective group II intron RNA to form a nucleotide integrase. The
RNA-protein complex is obtained by isolating it from source organisms. To
obtain the RNA-protein complex, or to obtain a group II intron-encoded
protein other than wild type, the source organism must be mutagenized. The
mutagenesis is a laborious, multistep process which yields limited
quantities of the RNA-protein complex. Thus, this method also provides
only limited quantities of the nucleotide integrase.

Accordingly, it is desirable to have methods for preparing nucleotide
integrases which are not laborious, which permit the nucleotide integrase
to be readily modified from the wild type, and which yield more than
limited quantities of the nucleotide integrase.

SUMMARY OF THE INVENTION 
The present invention provides new, improved, and easily manipulable 
methods for making nucleotide integrases. 
In one embodiment, the nucleotide integrase is prepared by
introducing a DNA molecule which comprises a group II intron DNA sequence
into a host cell. The group II intron DNA sequence is then expressed in
the host cell such that RNP particles having nucleotide integrase activity
are formed in the cell. Such RNP particles comprise an excised group II
intron RNA encoded by the introduced DNA molecule and a group II
intron-encoded protein encoded by the introduced DNA molecule. Thereafter,
the nucleotide integrase is isolated from the cell.

In another embodiment, the nucleotide integrase is prepared by
combining in vitro an excised group II intron RNA, referred to hereinafter
as "exogenous RNA", with a group II intron-encoded protein. Preferably,
the exogenous RNA is prepared by in vitro transcription of a DNA molecule
which comprises the group II intron sequence. Preferably, the group II
intron-encoded protein is made by introducing into a host cell a DNA
molecule which comprises the open reading frame sequence of a group II
intron, and then expressing the open reading frame sequence in the host
cell such that the group II intron-encoded protein encoded by the open
reading frame sequence is formed in the cell. Thereafter, the cell is
fractionated and the protein is recovered.

In another embodiment, the nucleotide integrase is prepared by
combining in vitro an excised group II intron RNA, referred to hereinafter
as "exogenous RNA", with an RNA-protein complex which comprises a group II
intron-encoded protein. Preferably, the exogenous RNA is prepared by
in vitro transcription of a DNA molecule which comprises the group II
intron sequence. Preferably, the RNA-protein complex is made by introducing
into a host cell a DNA molecule comprising a group II intron DNA sequence
which encodes a splicing-defective group II intron RNA. Thereafter, the
cell is fractionated and the RNA-protein complex is isolated.

The present invention also relates to a nucleotide integrase and to an
improved method for making RNA-protein complexes for use in preparing
nucleotide integrases in vitro.

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 is the plasmid map of plasmid pETLtrA19.

Figure 2 shows the nucleotide sequence of the 2.8 kb HindIII fragment
that is present in pETLtrA19 and that includes the Ll.ltrB intron DNA
sequence and portions of the nucleotide sequence of the flanking exons
ltrBE1 and ltrBE2, SEQ. ID. NO. 1; the nucleotide sequence of the ltrA
open reading frame, SEQ. ID. NO. 2; and the amino acid sequence of the LtrA
protein, SEQ. ID. NO. 3.

Figure 3 is the plasmid map of plasmid pETLtrA1-1.

Figure 4 is a schematic representation of the inserts in pLE12,
pETLtrA19 and pETLtrA1-1.

Figure 5 is the sequence of the sense strand of the double-stranded
DNA substrate, SEQ. ID. NO. 4, which was used to assess the activity of
the nucleotide integrase comprising an excised Ll.ltrB intron RNA and the
LtrA protein.

Figure 6a is a schematic depiction of the substrate which is cleaved
by the nucleotide integrase comprising the Ll.ltrB intron RNA and the LtrA
protein, and Figure 6b shows the IBS1 and IBS2 sequences of the substrate
and the cleavage sites of the double-stranded DNA substrate which is cleaved
by this integrase.

DETAILED DESCRIPTION OF THE INVENTION 
Nucleotide Integrases

Nucleotide integrases are enzymes that are capable of cleaving
double-stranded DNA substrates at specific recognition sites and of concomitantly
inserting nucleic acid molecules into the DNA substrate at the cleavage 
site. The nucleotide integrases insert an RNA molecule into the sense 
strand of the cleaved DNA substrate and a cDNA molecule into the antisense 
strand of the cleaved DNA substrate. 

Nucleotide integrases are ribonucleoprotein (RNP) particles that
comprise an excised group II intron RNA and a group II intron-encoded
protein, which is bound to the group II intron RNA. "Excised group II
intron RNA," as used herein, refers to the RNA that is, or that is derived
from, an in vitro or in vivo transcript of the group II intron DNA and that
lacks flanking exon sequences. The excised group II intron RNA typically
has six domains and a characteristic secondary and tertiary structure,
which is shown in Saldanha et al., 1993, Federation of American Societies
for Experimental Biology Journal, pp. 15-24, which is specifically
incorporated herein by reference. The excised group II intron RNA also
includes at least one hybridizing region which is complementary to a
recognition site on the substrate DNA. The hybridizing region has a
nucleotide sequence, referred to hereinafter as the "EBS sequence", which
is complementary to the sequence of the recognition site of the intended
substrate DNA, referred to hereinafter as the "IBS sequence". The group II
intron-encoded protein has an X domain, a reverse transcriptase domain,
and, preferably, a Zn domain. The X domain of the protein has maturase
activity. The Zn domain of the protein has Zn2+ finger-like motifs.
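The EBS/IBS relationship described above can be illustrated with a short Python sketch. It assumes strict Watson-Crick pairing (G-U wobble pairs are not counted), antiparallel alignment of the EBS (intron RNA, written 5' to 3') against the IBS (substrate DNA, also written 5' to 3'), and hypothetical example sequences. It checks perfect complementarity only and is not a model of the full recognition reaction.

```python
# Partner DNA base for each RNA base under Watson-Crick pairing (A-T, U-A, G-C, C-G).
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def ebs_matches_ibs(ebs_rna: str, ibs_dna: str) -> bool:
    """Return True if an EBS (RNA, 5'->3') is perfectly complementary to an
    IBS (DNA, 5'->3') when the two strands pair antiparallel."""
    ebs = ebs_rna.upper().replace("T", "U")  # tolerate DNA-style input
    ibs = ibs_dna.upper()
    if len(ebs) != len(ibs):
        return False
    # Reading the EBS 3'->5' gives the bases that face the IBS read 5'->3'.
    expected_ibs = "".join(RNA_TO_DNA_PARTNER[base] for base in reversed(ebs))
    return expected_ibs == ibs


# Hypothetical sequences, for illustration only.
print(ebs_matches_ibs("GUUGUG", "CACAAC"))  # True
print(ebs_matches_ibs("GUUGUG", "CACAAG"))  # False: one mismatch
```

Reversing the EBS before complementing is what encodes the antiparallel orientation; comparing the EBS directly against the IBS would test the wrong strand alignment.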

Group II intron RNA may be produced containing desired EBS sequences 
which hybridize to corresponding nucleotides on substrate DNA. In 
addition, group II intron RNA may be produced containing additional 
nucleotides in domain IV. In the methods of the present invention both of 
these group II RNA molecules are produced from an isolated DNA which is 
then introduced into a cell. Such isolated DNA typically is synthesized
using a DNA synthesizer or is genetically engineered, such as by in vitro
site-directed mutagenesis.

A. Preparation of the Nucleotide Integrase by Isolation from a 
Genetically-Engineered Cell. 

In one embodiment, the nucleotide integrase is made by introducing an 
isolated DNA molecule which comprises a group II intron DNA sequence into a 
host cell. Suitable DNA molecules include, for example, viral vectors, 
plasmids, and linear DNA molecules. Following introduction of the DNA 
molecule into the host cell, the group II intron DNA sequence is expressed 
in the host cell such that excised RNA molecules encoded by the introduced 
group II intron DNA sequence and protein molecules encoded by the introduced
group II intron DNA sequence are formed in the cell. The excised group II
intron RNA and group II intron-encoded protein are combined within the host
cell to produce the nucleotide integrase.

Preferably the introduced DNA molecule also comprises a promoter, 
more preferably an inducible promoter, operably linked to the group II 
intron DNA sequence. Preferably, the DNA molecule further comprises a 
sequence which encodes a tag to facilitate isolation of the nucleotide 
integrase such as, for example, an affinity tag and/or an epitope tag. 
Preferably, the tag sequences are at the 5' or 3' end of the open reading 
frame sequence. Suitable tag sequences include, for example, sequences 
which encode a series of histidine residues, an epitope of herpes simplex
virus glycoprotein D (i.e., the HSV antigen), or glutathione S-transferase.
Typically, the DNA molecule also comprises nucleotide sequences that encode
a replication origin and a selectable marker. Optionally, the DNA molecule
comprises sequences that encode molecules that modulate expression, such
as, for example, T7 lysozyme.

The DNA molecule comprising the group II intron sequence is
introduced into the host cell by conventional methods, such as by cloning
the DNA molecule into a vector and introducing the vector into the host
cell by electroporation or by CaCl2-mediated transformation. The method
used to introduce the DNA molecule depends on the particular host cell
used. Suitable host cells are those which are capable of expressing the
group II intron DNA sequence. Suitable host cells include, for example,
heterologous or homologous bacterial cells, yeast cells, mammalian cells,
and plant cells. In those instances where the host cell genome and the
group II intron DNA sequence use different genetic codes, it is preferred
that the group II intron DNA sequence be modified to comprise codons that
correspond to the genetic code of the host cell. The group II intron DNA
sequence typically is modified by using a DNA synthesizer or by in vitro
site-directed mutagenesis to prepare a group II intron DNA sequence with
different codons. Alternatively, to resolve the differences in the genetic
code of the intron and the host cell, DNA sequences that encode the tRNA
molecules which correspond to the genetic code of the group II intron are
introduced into the host cell. Optionally, DNA molecules which comprise
sequences that encode factors that assist in RNA or protein folding, or
that inhibit RNA or protein degradation, are also introduced into the cell.

The DNA sequences of the introduced DNA molecules are then expressed
in the host cell to provide a transformed host cell. As used herein, the
term "transformed cell" means a host cell that has been genetically
engineered to contain additional DNA, and is not limited to cells which are
cancerous. The RNP particles having nucleotide integrase activity are then
isolated from the transformed host cells.

Preferably, the nucleotide integrase is isolated by lysing the
transformed cells, such as by mechanically and/or enzymatically disrupting
the cell membranes of the transformed cell. The cell lysate is then
fractionated into an insoluble fraction and a soluble fraction. Preferably,
an RNP particle preparation is isolated from the soluble fraction. RNP
particle preparations include the RNP particles having nucleotide integrase
activity as well as ribosomes, mRNA and tRNA molecules. Suitable methods
for isolating RNP particle preparations include, for example,
centrifugation of the soluble fraction through a sucrose cushion. The RNP
particles preferably are further purified from the RNP particle
preparation or from the soluble fraction by, for example, separation on a
sucrose gradient or a gel filtration column, or by other types of
chromatography. For example, in those instances where the protein
component of the desired RNP particle has been engineered to include a tag
such as a series of histidine residues, the RNP particle may be further
purified from the RNP particle preparation by affinity chromatography on a
matrix which recognizes and binds to the tag. For example, Ni-NTA Superflow
from Qiagen, Chatsworth, CA, is suitable for isolating RNP particles in
which the group II intron-encoded protein has a His6 tag.

B. Preparation of the Nucleotide Integrase by Combining Exogenous RNA
with a Group II Intron-Encoded Protein to Form a Reconstituted RNP Particle

In another embodiment, the nucleotide integrase is formed by
combining an isolated exogenous RNA with an isolated group II
intron-encoded protein in vitro to provide a reconstituted RNP particle.
Preferably, the exogenous RNA is made by in vitro transcription of the
group II intron DNA. Alternatively, the exogenous RNA is made by in vitro
transcription of the group II intron DNA and the DNA of all, or portions,
of the flanking exons to produce an unprocessed transcript which contains
the group II intron RNA and the RNA encoded by the flanking exons or
portions thereof. The exogenous RNA is then spliced from the unprocessed
transcript.

The purified group II intron-encoded protein is prepared by
introducing an isolated DNA molecule into a host cell. The introduced DNA
molecule comprises the DNA sequence of the open reading frame (ORF) of the
group II intron operably linked to a promoter, preferably an inducible
promoter. Alternatively, the introduced DNA molecule comprises (1) the ORF
sequence, (2) at least some portion of the DNA sequence of the group II
intron which lies outside of the ORF sequence, and (3) a promoter which is
oriented in the DNA molecule to control expression of the ORF sequence.
Preferably, the introduced DNA molecule also comprises a sequence at the
5' or 3' end of the group II intron ORF which, when expressed in the host
cell, provides an affinity tag or epitope on the N-terminus or C-terminus
of the group II intron-encoded protein. Tagging the protein in this manner
facilitates isolation of the expressed protein. Thus, the DNA molecule may
comprise at the 5' or 3' end of the ORF, for example, a sequence which
encodes a series of histidine residues, the HSV antigen, or glutathione
S-transferase. These DNA molecules may also comprise at the 5' or 3' end of
the ORF a sequence that encodes thioredoxin or any other molecule which
enhances distribution of the protein encoded by the ORF into the soluble
fraction of the host cell. Typically, the DNA molecule also comprises
nucleotide sequences that encode a replication origin and a selectable
marker.
Conventional methods are used to introduce these DNA molecules into
any host cell which is capable of expressing the group II intron ORF
sequence. For example, the CaCl2-mediated transformation procedure as
described by Sambrook et al. in "Molecular Cloning: A Laboratory Manual",
pages 1-82, 1989, can be used to introduce the DNA molecules into E. coli
cells. Suitable host cells include, for example, heterologous or
homologous bacterial cells, yeast cells, mammalian cells, and plant cells.
In those instances where the host cells either lack or have limiting
amounts of the tRNA molecules for one or more of the codons which are
present in the ORF, it is preferred that a DNA molecule encoding the rare
tRNA molecules also be introduced into the host cell to increase the yield
of the protein. Alternatively, the DNA sequence of the ORF is modified to
match the preferred codon usage of the host cell.
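As a rough illustration of modifying an ORF to match host codon usage, the Python sketch below replaces codons synonymously, one in-frame codon at a time. The substitution table is an assumption for illustration only: it maps the arginine codons AGA and AGG, which are rare in E. coli, to CGT, another arginine codon. A real redesign would start from a codon-usage table for the chosen host. Because every substitution is synonymous, the encoded protein is unchanged.

```python
# Hypothetical synonymous-substitution table (all three codons encode arginine).
RARE_TO_PREFERRED = {"AGA": "CGT", "AGG": "CGT"}


def recode_orf(orf: str, table: dict[str, str] = RARE_TO_PREFERRED) -> str:
    """Replace codons in frame according to `table`; other codons are kept."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = (orf[i:i + 3] for i in range(0, len(orf), 3))
    return "".join(table.get(codon, codon) for codon in codons)


# Toy ORF: ATG AGA AAA AGG TAA -> ATG CGT AAA CGT TAA
print(recode_orf("ATGAGAAAAAGGTAA"))  # ATGCGTAAACGTTAA
```

Working codon by codon rather than with a plain string replacement matters: an "AGA" that spans two codons (as in ATG CAG AAA) is left alone.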

The ORF sequence is then expressed in the host cell, preferably by adding
a molecule which induces expression, to provide a transformed host cell.
The transformed cell is then lysed and preferably fractionated into a
soluble fraction and an insoluble fraction. The group II intron-encoded
protein is then isolated, preferably from the soluble fraction. Methods of
isolating the protein from the soluble fraction include, for example,
chromatographic methods such as gel filtration chromatography, ion
exchange chromatography, and affinity chromatography, which is particularly
useful for isolating tagged protein molecules.

Following purification of the group II intron-encoded protein, the
protein is incubated with the exogenous RNA, preferably in a buffer, to
allow formation of the nucleotide integrase. Optionally, the protein and
RNA are denatured prior to the incubation using guanidinium hydrochloride
or urea. The denaturant is then removed during the incubation in the
presence of cosolvents such as salts and metal ions to allow proper folding
of the protein and RNA in the nucleotide integrase.

C. Preparation of the Nucleotide Integrase by Combining Exogenous RNA
with an RNA-Protein Complex

Alternatively, the nucleotide integrase is prepared by combining the
exogenous RNA with an RNA-protein complex that has been isolated from an
organism that has been genetically engineered to produce an RNA-protein
complex in which the desired group II intron-encoded protein molecules are
associated with RNA molecules that include a splicing-defective group II
intron RNA but which lack the excised group II intron RNA. Preferably, the
exogenous RNA is prepared by in vitro transcription of a DNA molecule which
comprises the group II intron sequence.

Preferably, the RNA-protein complex is made by introducing into a
host cell an isolated DNA molecule which comprises a group II intron
sequence operably linked to a promoter, preferably an inducible promoter.
The group II intron sequence encodes a splicing-defective group II intron
RNA. Typically, the DNA molecule also comprises nucleotide sequences that
encode a replication origin and a selectable marker. The group II intron
DNA sequence is then expressed in the host cell. The group II intron
encodes a functional group II intron-encoded protein and a
splicing-defective group II intron RNA. Thus, the RNA-protein complex made
in this manner lacks the excised group II intron RNA molecules that encode
the group II intron-encoded protein. The RNA-protein complexes do, however,
contain the functional group II intron-encoded protein associated with RNA
molecules that comprise the mutant, unspliced form of the group II intron
RNA as well as other RNA molecules.

The resulting RNA-protein complex is isolated from the host cell and
then incubated with the exogenous RNA, preferably in a buffer, to form the
nucleotide integrase. During the incubation, the group II intron-encoded
protein dissociates from the RNA molecules which are present in the
RNA-protein complex and combines with the exogenous RNA to form the
nucleotide integrase.

These methods enable production of increased quantities of nucleotide
integrases. Conventional methods produce approximately 0.1 to 1 µg of
nucleotide integrase per liter of cultured cells. In the present
invention, at least 3 to 10 mg of nucleotide integrase is produced per
liter of cultured cells. These methods also offer the further advantage of
permitting the sequences of the RNA component and the protein component of
the nucleotide integrase to be readily modified.
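The size of the improvement follows directly from the two yield ranges stated above once both are expressed in the same unit (1 mg = 1,000 µg). The short Python calculation below pairs the extremes of the two ranges and uses only the figures given in the text.

```python
UG_PER_MG = 1000.0  # 1 mg = 1,000 micrograms

conventional_ug_per_l = (0.1, 1.0)  # conventional methods, µg per liter
present_mg_per_l = (3.0, 10.0)      # present methods, mg per liter

# Smallest improvement: lowest new yield against highest conventional yield.
min_fold = present_mg_per_l[0] * UG_PER_MG / conventional_ug_per_l[1]
# Largest improvement: highest new yield against lowest conventional yield.
max_fold = present_mg_per_l[1] * UG_PER_MG / conventional_ug_per_l[0]

print(f"{min_fold:,.0f}-fold to {max_fold:,.0f}-fold")  # 3,000-fold to 100,000-fold
```

Since the text gives the new yield as "at least" 3 to 10 mg per liter, the lower figure is best read as a floor on the improvement rather than a typical value.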

The following examples of methods for preparing a group II
intron-encoded protein and for preparing nucleotide integrases are included
for purposes of illustration and are not intended to limit the scope of the
invention.

Preparing Nucleotide Integrases In Vivo
Example 1

A nucleotide integrase comprising an excised RNA which is encoded by
the Ll.ltrB intron of the lactococcal conjugative element pRS01 of
Lactococcus lactis and the protein encoded by the ORF of the Ll.ltrB intron
was prepared by transforming cells of the BLR(DE3) strain of the bacterium
Escherichia coli, which has the recA genotype, with the plasmid pETLtrA19.
Plasmid pETLtrA19, which is schematically depicted in Figure 1, comprises
the DNA sequence for the group II intron Ll.ltrB from Lactococcus lactis,
shown as a thick line, positioned between portions of the flanking exons
ltrBE1 and ltrBE2, shown as open boxes. pETLtrA19 also comprises the DNA
sequence for the T7 RNA polymerase promoter and the T7 transcription
terminator. The sequences are oriented in the plasmid in such a manner
that the ORF sequence, SEQ. ID. NO. 2, within the Ll.ltrB intron is under
the control of the T7 RNA polymerase promoter. The ORF of the Ll.ltrB
intron, shown as an arrow box, encodes the LtrA protein. The sequences of
the Ll.ltrB intron and the flanking exons present in pETLtrA19 are shown
in Figure 2 and SEQ. ID. NO. 1. Vertical lines in Figure 2 denote the
junctions between the intron and the flanking sequences. The amino acid
sequence of the LtrA protein, SEQ. ID. NO. 3, is shown under the ORF
sequence, SEQ. ID. NO. 2, in Figure 2. The exon binding sites are encoded
by nucleotides 457 to 463 inclusive (EBS1), nucleotides 401 to 406
inclusive (EBS2a), and nucleotides 367 to 372 inclusive (EBS2b). Domain IV
is encoded by nucleotides 705 to 2572.
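The EBS and domain IV positions above are 1-based, inclusive nucleotide ranges. When such coordinates are used to extract the corresponding subsequences in software, they have to be converted to the 0-based, end-exclusive convention. The Python sketch below shows that conversion and the resulting region lengths. The sequence these coordinates refer to is not reproduced in this section, so slicing is demonstrated on a placeholder string.

```python
# 1-based, inclusive coordinates as given in Example 1.
REGIONS = {
    "EBS1": (457, 463),
    "EBS2a": (401, 406),
    "EBS2b": (367, 372),
    "domain IV": (705, 2572),
}


def to_slice(start: int, end: int) -> slice:
    """Convert a 1-based inclusive range to a 0-based, end-exclusive slice."""
    return slice(start - 1, end)


def region_length(start: int, end: int) -> int:
    """Number of nucleotides in a 1-based inclusive range."""
    return end - start + 1


for name, (start, end) in REGIONS.items():
    print(name, region_length(start, end), "nt")
# EBS1 7 nt, EBS2a 6 nt, EBS2b 6 nt, domain IV 1868 nt

# Placeholder sequence: 1-based positions 2..4 of "ACGTACGT" are "CGT".
print("ACGTACGT"[to_slice(2, 4)])  # CGT
```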

pETLtrA19 was prepared first by digesting pLE12, which was obtained
from Dr. Gary Dunny of the University of Minnesota, with HindIII and
separating the restriction fragments on a 1% agarose gel. A 2.8 kb HindIII
fragment which contains the Ll.ltrB intron together with portions of the
flanking exons ltrBE1 and ltrBE2 was recovered from the agarose gel, and the
single-stranded overhangs were filled in with the Klenow fragment of DNA
polymerase I obtained from Gibco BRL, Gaithersburg, MD. The resulting
fragment was ligated into plasmid pET-11a that had been digested with XbaI
and treated with Klenow fragment. pET-11a was obtained from Novagen,
Madison, WI.

pETLtrA19 was introduced into the E. coli cells using the
conventional CaCl2-mediated transformation procedure of Sambrook et al. as
described in "Molecular Cloning: A Laboratory Manual", pages 1-82, 1989.
Single transformed colonies were selected on plates containing Luria-Bertani
(LB) medium supplemented with ampicillin to select the plasmid and with
tetracycline to select the BLR strain. One or more colonies were
inoculated into 2 ml of LB medium supplemented with ampicillin and grown
overnight at 37°C with shaking. 1 ml of this culture was inoculated into
100 ml of LB medium supplemented with ampicillin and grown at 37°C with
shaking at 200 rpm until the OD595 of the culture reached 0.4. Then
isopropyl-β-D-thiogalactoside was added to the culture to a final
concentration of 1 mM, and incubation was continued for 3 hours. The
entire culture was then harvested by centrifugation at 2,200 x g, 4°C, for
5 minutes. The bacterial pellet was washed with 150 mM NaCl and finally
resuspended in 1/20 volume of the original culture in 50 mM Tris, pH 7.5,
1 mM EDTA, 1 mM DTT, and 10% (v/v) glycerol (Buffer A). Bacteria were
frozen at -70°C.

To produce a lysate, the bacteria were thawed and refrozen at -70°C
three times. Then 4 volumes of 500 mM KCl, 50 mM CaCl2, 25 mM Tris, pH 7.5,
and 5 mM DTT (HKCTD) were added to the lysate, and the mixture was
sonicated until no longer viscous, i.e., for 5 seconds or longer. The
lysate was fractionated into a soluble fraction and an insoluble fraction
by centrifugation at 14,000 x g, 4°C, for 15 minutes. Then 5 ml of the
resulting supernatant, i.e., the soluble fraction, were loaded onto a
sucrose cushion of 1.85 M sucrose in HKCTD and centrifuged for 17 hours at
4°C and 50,000 rpm in a Beckman Ti 50 rotor. The pellet, which contains the
RNP particles, was washed with 1 ml of water and then dissolved in 25 µl of
10 mM Tris, pH 8.0, 1 mM DTT on ice. Insoluble material was removed by
centrifugation at 1,500 x g, 4°C, for 5 minutes. The RNP particles prepared
according to this method comprise the excised Ll.ltrB intron RNA and the
LtrA protein.
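The volumes and concentrations in this lysis step can be checked with simple mixing arithmetic. The Python sketch below assumes ideal mixing, a lysate volume equal to the Buffer A resuspension volume (1/20 of a 100 ml culture, i.e., 5 ml, with no loss during freeze-thawing), and that "4 volumes" is measured against that lysate volume. Glycerol is tracked in % (v/v); all other components are in mM.

```python
# Component concentrations: mM, except glycerol in % (v/v).
BUFFER_A = {"Tris": 50.0, "EDTA": 1.0, "DTT": 1.0, "glycerol_pct": 10.0}
HKCTD = {"KCl": 500.0, "CaCl2": 50.0, "Tris": 25.0, "DTT": 5.0}


def mix(parts: list[tuple[dict[str, float], float]]) -> dict[str, float]:
    """Final concentrations after combining (composition, volume) parts."""
    total_volume = sum(volume for _, volume in parts)
    final: dict[str, float] = {}
    for composition, volume in parts:
        for component, conc in composition.items():
            # Each part contributes conc * (its volume / total volume).
            final[component] = final.get(component, 0.0) + conc * volume / total_volume
    return final


culture_ml = 100.0
lysate_ml = culture_ml / 20  # resuspended in 1/20 of the culture volume -> 5 ml
hkctd_ml = 4 * lysate_ml     # 4 volumes of HKCTD -> 20 ml
final = mix([(BUFFER_A, lysate_ml), (HKCTD, hkctd_ml)])

for component, conc in sorted(final.items()):
    print(f"{component}: {conc:g}")
```

Under these assumptions the sonicated mixture is 25 ml and contains about 400 mM KCl, 40 mM CaCl2, 30 mM Tris, 4.2 mM DTT, 0.2 mM EDTA and 2% glycerol, so the 5 ml of supernatant loaded onto the sucrose cushion corresponds to roughly one-fifth of the lysate from a 100 ml culture.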

Example 2 

A nucleotide integrase comprising the LtrA protein and the excised
Ll.ltrB intron RNA was prepared as described in Example 1, except that the
plasmid pETLtrA19 was used to transform cells of the BL21(DE3) strain of
E. coli.

Example 3 

A nucleotide integrase was prepared by transforming cells of the
E. coli strain BLR(DE3) with pETLtrA19 as described in Example 1, except
that the transformed E. coli were grown in Super Broth (SOB) medium and
shaken at 300 rpm during the 3 hour incubation.

Example 4 

A nucleotide integrase was prepared by transforming cells of the
E. coli strain BL21(DE3) with pETLtrA19 as described above in Example 2,
except that the cells were also transformed with plasmid pOM62, which is
based on plasmid pACYC184 and has an approximately 150 bp insert of the
argU (dnaY) gene at the EcoRI site. The argU gene encodes the tRNA for the
rare arginine codons AGA and AGG. The ltrA gene contains 17 of these rare
arginine codons. The transformed cells were grown in SOB medium as
described in Example 3 and fractionated into a soluble fraction and an
insoluble fraction as described in Example 1.
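Rare codons have to be counted in the ORF's reading frame, since an AGA or AGG that straddles two codons is not decoded by the argU tRNA. The Python sketch below counts in-frame AGA and AGG codons. The ltrA sequence itself is not reproduced in this section, so the function is demonstrated on short hypothetical sequences.

```python
RARE_ARG_CODONS = frozenset({"AGA", "AGG"})  # decoded by the argU tRNA


def count_rare_codons(orf: str, rare: frozenset[str] = RARE_ARG_CODONS) -> int:
    """Count in-frame codons of `orf` (read from its first base) that are in `rare`."""
    orf = orf.upper().replace("U", "T")  # accept RNA or DNA input
    full_codons = len(orf) // 3
    return sum(1 for i in range(full_codons) if orf[3 * i:3 * i + 3] in rare)


# Hypothetical examples: ATG AGA AGG CGT AGA TAA has three rare codons;
# in ATG CAG AAA TAA the "AGA" spans two codons and is not counted.
print(count_rare_codons("ATGAGAAGGCGTAGATAA"))  # 3
print(count_rare_codons("ATGCAGAAATAA"))        # 0
```

Applied to the ltrA ORF, a count of this kind is how a figure such as the 17 rare arginine codons noted above would be obtained.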

Preparing a Group II Intron-Encoded Protein Having a Purification Tag on
the C Terminus
Example 5 

To facilitate purification of the protein, the ltrA ORF was tagged at
the C-terminus with a His6 affinity tag and an epitope derived from the
herpes simplex virus glycoprotein D. The plasmid adding the tags was made
in two steps by using PCR. In the first step, a fragment containing exon 1
and the ltrA ORF was amplified using the primers LtrAex1.Xba, having the
sequence 5' TCACCTCATCTAGACATTTTCTCC 3', SEQ. ID. NO. 5, which introduces
an XbaI site in exon 1 of ltrB, and LtrAexpr3, 5'
CGTTCGTAAAGCTAGCCTTGTGTTTATG 3', SEQ. ID. NO. 6, which substitutes a CGA
(arginine) codon for the stop codon and introduces an NheI site at the 3'
end of the ltrA ORF. The PCR product was cut with XbaI and NheI, and the
restriction fragment was gel purified and cloned into pET-27b(+) (Novagen,
Madison, WI) that had been cut with XbaI and NheI. The resulting plasmid
pIntermediate-C fuses the 3' end of the ltrA ORF to an HSV tag and a His6
purification tag, both of which are present on the vector pET-27b(+). In a
second step, intron sequences 3' to the ORF and exon 2 were amplified using
pLE12 as a substrate and the 5' primer LtrAConZn1, having the sequence
5' CACAAGTGATCATTTACGAACG 3', SEQ. ID. NO. 7, and the 3' primer LtrAex2,
which has the sequence 5' TTGGGATCCTCATAAGCTTTGCCGC 3', SEQ. ID. NO. 8. The
PCR product was cut with BclI and BamHI, and the resulting fragment was
filled in, gel purified and cloned into pIntermediate-C, which had been
cleaved with Bpu1102I and filled in. The resulting plasmid is designated
pC-hisLtrA19.

Cells of the BLR(DE3) strain of E. coli were transformed as described
in Example 1 with pIntermediate-C and cultured at 37°C for 3 hours in SOB
medium as described in Example 3. The cells were also fractionated into a
soluble fraction, which contains RNP particles, and an insoluble fraction
as described in Example 1.
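The restriction sites carried by the Example 5 primers can be located by scanning the primer sequences for the relevant recognition sequences. The Python sketch below does this for the four enzymes used with these primers: XbaI (TCTAGA), NheI (GCTAGC), BclI (TGATCA) and BamHI (GGATCC). All four recognition sequences are palindromic, so scanning one strand also covers the other. Positions are reported 1-based from each primer's 5' end, and sites for enzymes not in the table are ignored.

```python
# Recognition sequences (all palindromic, so one strand suffices).
ENZYMES = {"XbaI": "TCTAGA", "NheI": "GCTAGC", "BclI": "TGATCA", "BamHI": "GGATCC"}

# Example 5 primers, keyed by SEQ. ID. NO., written 5' -> 3'.
PRIMERS = {
    5: "TCACCTCATCTAGACATTTTCTCC",      # LtrAex1.Xba
    6: "CGTTCGTAAAGCTAGCCTTGTGTTTATG",  # LtrAexpr3
    7: "CACAAGTGATCATTTACGAACG",        # LtrAConZn1
    8: "TTGGGATCCTCATAAGCTTTGCCGC",     # LtrAex2
}


def find_sites(seq: str, enzymes: dict[str, str] = ENZYMES) -> list[tuple[str, int]]:
    """Return (enzyme, 1-based start) for every recognition site in `seq`."""
    seq = seq.upper()
    hits = []
    for enzyme, site in enzymes.items():
        start = seq.find(site)
        while start != -1:
            hits.append((enzyme, start + 1))
            start = seq.find(site, start + 1)  # keep scanning past this hit
    return hits


for seq_id, primer in PRIMERS.items():
    print(f"SEQ. ID. NO. {seq_id}: {find_sites(primer)}")
```

For these four primers the scan finds exactly one site each: XbaI in SEQ. ID. NO. 5, NheI in SEQ. ID. NO. 6, BclI in SEQ. ID. NO. 7 and BamHI in SEQ. ID. NO. 8, matching the enzymes used to cut the corresponding PCR products.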
Example 6

To facilitate purification of the protein, the ltrA ORF was tagged at
the N-terminus with a His6 affinity tag and the epitope tag XPRESS™, which
was obtained from Invitrogen, San Diego, CA. The plasmid adding the tags
was made in two steps by using PCR mutagenesis. In the first step, the ltrA
ORF and 3' exon were amplified, and BamHI sites were appended to both the
5' and 3' ends of the ltrA ORF, using pLE12 as a substrate and the
following primer pair: the 5' primer N-LtrA5', having the sequence
5' CAAAGGATCCGATGAAACCAACAATGGCAA 3', SEQ. ID. NO. 9; and the 3' primer
LtrAex2, SEQ. ID. NO. 8. The PCR product was cut with BamHI, and the
resulting restriction fragment was gel purified and cloned into the BamHI
site of plasmid pRSETB obtained from Invitrogen, San Diego, CA. The
resulting plasmid pIntermediate-N fuses the N-terminus of the ltrA ORF to a
His6 purification tag and adds an XPRESS™ epitope tag from the vector. In a
second step, the 5' exon and Ll.ltrB intron sequences 5' to the ORF were
amplified using pLE12 as a substrate and the 5' primer NdeLTR5', having the
sequence 5' AGTGGCTTCCATATGCTTGGTCATCACCTCATC 3', SEQ. ID. NO. 10, and the
3' primer NdeLTR3', which has the sequence
5' GGTAGAACCATATGAAATTCCTCCTCCCTAATCAATTTT 3', SEQ. ID. NO. 11. The PCR
product was cut with NdeI and filled in, and the fragment was gel purified
and cloned into pIntermediate-N, which had also been cut with NdeI.
Plasmids were screened for the orientation of the insert, and those
oriented such that the 5' exon was proximal to the T7 promoter were used to
transform the host cells. The resulting plasmid pFinal-N expresses a
message under the control of the T7 RNA polymerase promoter which comprises
the E1 and E2 portions of the exons ltrBE1 and ltrBE2 and the ltrA ORF
fused at the 5' end to a His6 purification tag and the XPRESS™ epitope tag.

Cells of the BLR(DE3) strain of E. coli were transformed as described
in Example 1 with pIntermediate-N and cultured at 37°C for 3 hours in SOB
medium as described in Example 3. The cells were also fractionated into a
soluble fraction, which contains RNP particles, and an insoluble fraction
as described in Example 1.

Example 7

Plasmid pETLtrA1-1 was used to prepare a partially purified
preparation of the LtrA protein, which is encoded by the ORF of the Ll.ltrB
intron. Plasmid pETLtrA1-1 is a derivative of pETLtrA19 and lacks exon 1
and the intron sequences upstream of the ltrA ORF. Accordingly, the ltrA
ORF is directly downstream of the phage T7 promoter, following the
Shine-Dalgarno sequence in the plasmid. The plasmid map of pETLtrA1-1 is
shown in Figure 3.

pETLtrA1-1 was made by using the polymerase chain reaction to amplify
the ltrA ORF with the 5' primer LtrAexpr, 5' AAAACCTCCATATGAAACCAACAATG
3' (SEQ. ID. NO. 12), which introduces an NdeI site, and the 3' primer
LtrAex2, SEQ. ID. NO. 8. The PCR product was cut with NdeI and BamHI, gel
purified on a 1% agarose gel, and cloned into pET20-lla. The inserts of
pLE12, pETLtrA19 and pETLtrA1-1, each of which contains the ltrA ORF, are
depicted in Figure 4.
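The NdeI site in primer LtrAexpr serves a purpose: the ATG within CATATG doubles as the ltrA initiation codon. The ORF therefore follows the plasmid's Shine-Dalgarno sequence with no extra codons in between. The Python sketch below checks that overlap. It assumes only the primer sequence of SEQ. ID. NO. 12 and the first five codons of SEQ. ID. NO. 2. The function name is hypothetical.

```python
NDEI = "CATATG"
LTRAEXPR = "AAAACCTCCATATGAAACCAACAATG"  # SEQ ID NO: 12, written 5' -> 3'
ORF_START = "ATGAAACCAACAATG"            # first five codons of SEQ ID NO: 2


def atg_in_ndei_site_is_orf_start(primer: str, orf_start: str) -> bool:
    """True if the ATG inside the primer's first NdeI site begins `orf_start`."""
    site = primer.find(NDEI)
    if site < 0:
        return False
    # In CATATG the last three bases (ATG) form the initiation codon.
    atg = site + 3
    return primer[atg:atg + len(orf_start)] == orf_start


print(atg_in_ndei_site_is_orf_start(LTRAEXPR, ORF_START))  # True
```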

pETLtrA1-1 was introduced into cells of the E. coli strain BLR (DE3) as
described in Example 1, and the transformed cells were grown for 3 hours in
SOB medium at 37°C as described in Example 3. Thereafter, the cells were
lysed, and the resulting lysate was fractionated into a soluble fraction and
an insoluble fraction by low-speed centrifugation as described in Example 1.

Preparing a Nucleotide Integrase
Example 8

A nucleotide integrase is prepared in vitro by combining an exogenous
RNA comprising an excised Ll.ltrB intron RNA with a purified LtrA protein.
The purified LtrA is obtained by subjecting the partially purified LtrA
protein of Example 7 to standard chromatographic methods. The exogenous
RNA is prepared by cloning the Ll.ltrB intron together with its flanking
exons into a plasmid downstream of a T7 promoter, linearizing the plasmid
downstream of exon 2 with a restriction enzyme, and transcribing the
intron with T7 RNA polymerase. The in vitro transcript is incubated for
one hour at 37°C in 500 mM NH4Cl, 50 mM MgCl2, 10 mM DTT and 2 units of
RNase inhibitor to produce excised intron RNA or to increase its yield.
The exogenous RNA and purified LtrA protein are then incubated in a buffer
to form the nucleotide integrase. The nucleotide integrase is then isolated
from the reaction mixture.
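The incubation in Example 8 is specified only by final concentrations. The sketch below shows the C1V1 = C2V2 arithmetic for assembling that buffer. The stock concentrations (4 M NH4Cl, 1 M MgCl2, 1 M DTT) and the 50 µL reaction volume are hypothetical choices, not values from the example. The 2 units of RNase inhibitor are left out because they are given as an amount, not a concentration.

```python
# Final concentrations from Example 8, in mM.
FINAL_MM = {"NH4Cl": 500.0, "MgCl2": 50.0, "DTT": 10.0}

# Hypothetical stock concentrations, in mM (4 M, 1 M, 1 M); adjust to the stocks on hand.
STOCK_MM = {"NH4Cl": 4000.0, "MgCl2": 1000.0, "DTT": 1000.0}


def stock_volumes(final_mm: dict, stock_mm: dict, total_ul: float) -> dict:
    """Microlitres of each stock so that C_stock * V_stock = C_final * V_total."""
    volumes = {}
    for reagent, c_final in final_mm.items():
        c_stock = stock_mm[reagent]
        if c_stock < c_final:
            raise ValueError(f"{reagent} stock is more dilute than the final concentration")
        volumes[reagent] = c_final * total_ul / c_stock
    return volumes


vols = stock_volumes(FINAL_MM, STOCK_MM, total_ul=50.0)  # 50 uL is a hypothetical volume
print(vols)  # {'NH4Cl': 6.25, 'MgCl2': 2.5, 'DTT': 0.5}
print(50.0 - sum(vols.values()), "uL left for transcript, RNase inhibitor and water")
```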

Comparative Example A

RNP particles were prepared as described in Example 1 from cells of
the BLR (DE3) strain of E. coli that had been transformed with plasmid
pET11a, which lacks a group II intron. Accordingly, these RNP particles do
not comprise excised group II intron RNA or group II intron-encoded
proteins and therefore do not have nucleotide integrase activity.
Comparative Example B

RNP particles were prepared as described in Example 1 from cells of
the BLR (DE3) strain of E. coli that had been transformed with plasmid
pETLtrA19FS, which comprises an ltrA ORF having a frameshift 372 base
pairs downstream of the initiation codon of the ltrA ORF. Accordingly,
these RNP particles contain a truncated LtrA protein, i.e., an LtrA protein
lacking the Zn domain, and therefore do not have nucleotide integrase
activity.

Characterization of the RNP particles of Examples 1 and 2.

Portions of the RNP particle preparations of Examples 1 and 2 and of
comparative examples A and B were subjected to SDS gel electrophoresis.
Staining of the resulting gel with Coomassie Blue permitted visualization
of the proteins in each preparation. A band of approximately 70 kDa, which
corresponds to the predicted molecular weight of the LtrA protein, was
seen in the lanes containing aliquots of the RNP particles of Examples 1
and 2. This band was absent from the lanes containing the RNP particles
of comparative examples A and B. On the basis of the staining intensity
of the 70 kDa band, the quantity of LtrA protein in 10 OD260 units of RNP
particles was estimated to be approximately 3 µg. These results indicate
that RNP particles containing the group II intron-encoded protein LtrA can
be prepared by expression of the group II intron Ll.ltrB in a heterologous
host cell.
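The Python sketch below converts the gel-based estimate above into moles: 3 µg of a protein of about 70 kDa. It assumes that the whole 70 kDa band is LtrA and uses the apparent molecular weight from the gel. The result is therefore only as precise as the staining estimate.

```python
PROTEIN_UG = 3.0    # estimated LtrA in the sample (ug), from the staining intensity
MW_KDA = 70.0       # apparent molecular weight of LtrA (kDa)
OD260_UNITS = 10.0  # amount of RNP particles the estimate refers to


def picomoles(mass_ug: float, mw_kda: float) -> float:
    """Convert a protein mass in micrograms to picomoles.

    mol = (mass_ug * 1e-6 g) / (mw_kda * 1e3 g/mol); pmol = mol * 1e12,
    which simplifies to mass_ug * 1e3 / mw_kda.
    """
    return mass_ug * 1e3 / mw_kda


total_pmol = picomoles(PROTEIN_UG, MW_KDA)
print(f"{total_pmol:.1f} pmol total, {total_pmol / OD260_UNITS:.2f} pmol per OD260 unit")
# 42.9 pmol total, 4.29 pmol per OD260 unit
```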

The reverse transcriptase activities of the RNP particles of Examples
1 and 2 and of comparative examples A and B were assayed by incubating
each RNP particle preparation with a poly(rA) template and oligo(dT)18 as
a primer. The RNP particles of Examples 1 and 2 exhibited reverse
transcriptase activity, whereas the RNP particles of comparative examples
A and B exhibited none. These results indicate that the methods described
in Examples 1 and 2 are useful for preparing RNP particles that have
reverse transcriptase activity. The reverse transcriptase activity present
in nucleotide integrases allows incorporation of a cDNA molecule into the
cleavage site of the double-stranded DNA cut by the nucleotide integrase.

Characterizing the Distribution and Yield of the LtrA Protein
Portions of the insoluble and soluble fractions of the lysates from the
cells transformed and cultured according to the methods described in
Examples 1, 2, 3 and 4 were subjected to SDS polyacrylamide gel
electrophoresis. Following electrophoresis, the SDS gels were stained
with Coomassie blue to compare the yield and the distribution of the
70 kDa LtrA protein prepared by the methods of Examples 1, 2, 3 and 4.
The results of this assay demonstrated that more of the LtrA protein was
found in the soluble fraction when the transformed BLR (DE3) cells were
grown in SOB medium and shaken at 300 rpm than when the transformed BLR
cells were grown in LB medium and shaken at 200 rpm. These results also
indicated that the total amount of LtrA protein produced by the
transformed BLR cells, that is, the amount of LtrA in both the soluble and
insoluble fractions, increased when a plasmid comprising the Ll.ltrB
intron and a plasmid comprising the argU (dnaY) gene were both introduced
into the host cells.

Characterization of the Group II Intron-Encoded Protein Prepared According
to the Methods of Examples 5 and 6.
Portions of the insoluble and soluble fractions of the lysates from the
cells transformed and cultured according to the methods described in
Examples 5 and 6 and in comparative examples A and B were subjected to
electrophoresis on duplicate SDS-polyacrylamide gels. One of the gels was
stained with Coomassie blue, and the proteins on the duplicate gel were
transferred to nitrocellulose paper by Western blotting. A primary
antibody to the HSV epitope or to the XPRESS™ epitope, together with an
alkaline phosphatase-labeled anti-mouse IgG secondary antibody, was used
in an enzyme-linked immunoassay to identify proteins carrying the HSV
epitope or the XPRESS™ epitope. The results of these assays showed that
the anti-HSV antibody and the anti-XPRESS™ antibody bound to a protein
having a molecular weight of approximately 70 kDa, which is the molecular
weight of the LtrA protein. The HSV-tagged LtrA protein and the
XPRESS™-tagged LtrA protein were found in the soluble and insoluble
fractions from cells transformed with pIntermediate-C and pIntermediate-N,
but not in the soluble or insoluble fractions of cells transformed with
pET-27b(+) and pRSETB. Thus, the methods of Examples 5 and 6 are useful
for preparing a tagged group II intron-encoded protein. These assays also
demonstrated that the amount of the tagged group II intron-encoded protein
present in the soluble fraction, from which the RNP particles are derived,
increases when the transformed and induced cells are incubated at 28°C
rather than at 37°C. Additional studies showed that incubation times of
30 minutes to 3 hours resulted in production of the tagged protein, but
these incubation times yielded less of the protein and are therefore less
preferred.
Using the RNP Particles to Cleave Double-Stranded DNA and to Insert
Nucleotide Sequences into the Cleavage Site.

Nucleotide integrases are useful for cleaving one or both strands of a
double-stranded DNA substrate, catalyzing the attachment of the excised
group II intron RNA molecule to one of the strands of the substrate DNA,
and catalyzing the formation of a cDNA molecule on the other strand of the
cleaved double-stranded DNA substrate. Thus, nucleotide integrases are
useful analytical tools for determining the location of a defined sequence
in a double-stranded DNA substrate. Moreover, the simultaneous insertion
of the nucleic acid molecule into the first strand of DNA permits tagging
of the cleavage site of the first strand with a radiolabeled molecule. In
addition, the automatic attachment of an RNA molecule to one strand of the
DNA substrate permits identification of the cleavage site through
hybridization studies that use a probe complementary to the attached RNA
molecule. An attached RNA molecule that is tagged with a molecule such as
biotin also enables the cleaved DNA to be affinity purified. Furthermore,
the cleavage of one or both strands of the double-stranded DNA and the
concomitant insertion of a nucleotide sequence into the cleavage site
permit incorporation of new genetic information or a genetic marker into
the cleavage site, as well as disruption of the cleaved gene. Thus,
nucleotide integrases are also useful for rendering the substrate DNA
nonfunctional or for changing the characteristics of the RNA and protein
encoded by the substrate DNA. While nucleotide integrases can be used to
cleave double-stranded DNA substrates over a wide range of temperatures,
good results are obtained at a reaction temperature of from about 30°C to
about 42°C, preferably from about 30°C to about 37°C. A suitable reaction
medium contains a monovalent cation such as Na+ and a divalent cation,
preferably a magnesium or manganese ion, more preferably a magnesium ion,
at a concentration that is less than 100 mM and greater than 1 mM.
Preferably, the divalent cation is at a concentration of about 5 to about
20 mM. The preferred pH of the medium is from about 6.0 to about 8.5, more
preferably from about 7.5 to about 8.0.
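Each reaction parameter above has a broad range and a narrower preferred range. Below is a hypothetical Python helper that grades a proposed set of conditions against those stated ranges. The class and function names are illustrative only. The divalent cation limits follow the wording "greater than 1 mM and less than 100 mM," so both endpoints are excluded. The temperature and pH ranges are treated as inclusive.

```python
from dataclasses import dataclass


@dataclass
class ReactionConditions:
    temperature_c: float  # reaction temperature, degrees C
    divalent_mm: float    # Mg2+ (or Mn2+) concentration, mM
    ph: float


def _grade(value: float, broad: tuple, preferred: tuple) -> str:
    """Classify a value against inclusive (low, high) ranges."""
    if preferred[0] <= value <= preferred[1]:
        return "preferred"
    if broad[0] <= value <= broad[1]:
        return "within broader range"
    return "outside stated ranges"


def assess(c: ReactionConditions) -> dict[str, str]:
    """Grade each parameter against the ranges given in the text."""
    report = {
        "temperature": _grade(c.temperature_c, (30, 42), (30, 37)),
        "pH": _grade(c.ph, (6.0, 8.5), (7.5, 8.0)),
    }
    # Divalent cation: about 5-20 mM preferred; otherwise must be >1 mM and <100 mM.
    if 5 <= c.divalent_mm <= 20:
        report["divalent cation"] = "preferred"
    elif 1 < c.divalent_mm < 100:
        report["divalent cation"] = "within broader range"
    else:
        report["divalent cation"] = "outside stated ranges"
    return report


print(assess(ReactionConditions(temperature_c=37, divalent_mm=10, ph=7.5)))
```

The text notes that integrases also work over a wider temperature range. A result of "outside stated ranges" therefore means only that the conditions fall outside the ranges quoted here, not that the reaction will fail.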

Cleavage of 3'- and 5'-end-labeled double-stranded DNA

0.025 OD260 units of the RNP particles of Example 1 and of comparative
examples A and B were incubated for 20 minutes with 150,000 cpm of each of
a 5'-end-labeled and a 3'-end-labeled DNA substrate that comprises the
exon 1/exon 2 junction of the ltrB gene. The sequence of the 129 base pair
substrate, which comprises a 70 base pair region spanning the exon 1/exon 2
junction of the ltrB gene plus plasmid sequences, is depicted in Figure 5
and SEQ. ID. NO. 4. To verify cleavage, the products were resolved on a 6%
polyacrylamide gel.

The substrate cleaved by the nucleotide integrase comprising the excised
Ll.ltrB intron RNA and the LtrA protein is schematically depicted in
Figure 6(a). In addition, the IBS1 and IBS2 sequences of the substrate are
shown in Figure 6(b). As shown in Figure 6, the IBS1 and IBS2 sequences,
which are complementary to the EBS sequences of the Ll.ltrB intron RNA,
are present in exon 1 of the ltrB gene. As also depicted in Figure 6, the
RNP particles prepared according to the method of Example 1 cleaved the
sense strand of the substrate at position 0, which is the exon 1/exon 2
junction, and the antisense strand at position +9. When the RNP particles
prepared according to the method of Example 1 were treated either with
RNase A/T1 to degrade the RNA in the particles or with proteinase K to
degrade the protein component of the particles prior to incubation of the
particles with the substrate, no cleavage of the substrate was observed.
These results indicate that both the RNA component and the protein
component of the nucleotide integrase are needed to cleave both strands
of the substrate DNA.

Cleaving Both Strands of Double-Stranded DNA and Inserting the Intron RNA
of the Nucleotide Integrase into the Cleavage Site.

0.025 OD260 units of the RNP particle preparation of Example 1 were
reacted with 125 fmoles (150,000 cpm) of the 129 base pair internally
labeled DNA substrate for 20 minutes. To verify cleavage, the products
were glyoxalated and analyzed in a 1% agarose gel.
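The figures "125 fmoles (150,000 cpm)" fix the specific activity of the labeled substrate. The Python sketch below derives that value. It also estimates the corresponding mass of substrate, using an average of about 650 g/mol per base pair of double-stranded DNA. That average is an assumption, not a value stated in the text.

```python
TOTAL_CPM = 150_000
TOTAL_FMOL = 125.0
SUBSTRATE_BP = 129
G_PER_MOL_PER_BP = 650.0  # assumed average mass of one double-stranded base pair


def specific_activity(cpm: float, fmol: float) -> float:
    """Counts per minute carried by one femtomole of labeled substrate."""
    return cpm / fmol


def fmol_to_ng(fmol: float, bp: int, g_per_mol_per_bp: float) -> float:
    """Mass in ng: fmol * 1e-15 mol * (bp * g/mol) * 1e9 ng/g."""
    return fmol * bp * g_per_mol_per_bp * 1e-6


sa = specific_activity(TOTAL_CPM, TOTAL_FMOL)
mass_ng = fmol_to_ng(TOTAL_FMOL, SUBSTRATE_BP, G_PER_MOL_PER_BP)
print(f"{sa:.0f} cpm/fmol, about {mass_ng:.1f} ng of substrate")
# 1200 cpm/fmol, about 10.5 ng of substrate
```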

A dark band of radiolabel at approximately 1.0 kb and lighter bands at
approximately 0.8, 1.1, 1.4, 1.5, 1.6, 1.9, 2.5 and 3.2 kb were observed
on the gel. Pretreatment of the reaction products with RNase prior to
electrophoresis on the agarose gel resulted in the complete disappearance
of these bands. These results indicate that Ll.ltrB intron RNA was
attached to the DNA substrate during reaction of the substrate with the
RNP particles of Example 1. On the basis of the size of the Ll.ltrB
intron, it is believed that the band at 2.5 kb represents the integration
of the full-length group II intron RNA into the cleavage site of the sense
strand. The presence of smaller radiolabeled products on the gel is
believed to be due to degradation of the integrated intron RNA by RNases
that may be present in the RNP particle preparation. The finding that the
RNA-DNA products withstand denaturation with glyoxal indicates a covalent
linkage between the intron RNA and the DNA substrate.



WO 98/23756 






PCT/US97/21076



TGACAATCTA ACTCCTGAAC AAATTCATGA AATAGGTCGT CAAACCATAT TAGAATTTAC 120
AGGTGGCGAA TATGAATTTG TGATTGCAAC CCACGTCGAT CGTGAACACA TCCATAACGT 180
GCGCCCAGAT AGGGTGTTAA GTCAAGTAGT TTAAGGTACT ACTCTGTAAG ATAACACAGA 240
AAACAGCCAA CCTAACCGAA AAGCGAAAGC TGATACGGGA ACAGAGCACG GTTGGAAAGC 300
GATGAGTTAC CTAAAGACAA TCGGGTACGA CTGAGTCGCA ATGTTAATCA GATATAAGGT 360
ATAAGTTGTG TTTACTGAAC GCAAGTTTCT AATTTCGGTT ATGTGTCGAT AGAGGAAAGT 420
GTCTGAAACC TCTAGTACAA AGAAAGGTAA GTTATGGTTG TGGACTTATC TGTTATCACC 480
ACATTTGTAC AATCTGTAGG AGAACCTATG GGAACGAAAC GAAAGCGATG CCGAGAATCT 540
GAATTTACCA AGACTTAACA CTAACTGGGG ATACCCTAAA CAAGAATGCC TAATAGAAAG 600
GAGGAAAAAG GCTATAGCAC TAGAGCTTGA AAATCTTGCA AGGGTACGGA GTACTCGTAG 660
TATTCTGAGA AGGGTAACGC CCTTTACATG GCAAAGGGGT ACAGTTATTG TGTACTAAAA 720
TTAAAAATTG ATTAGGGAGG AAAACCTCAA AATGAAACCA ACAATGGCAA TTTTAGAAAG 780
AATCAGTAAA AATTCACAAG AAAATATAGA CGAAGTTTTT ACAAGACTTT ATCGTTATCT 840
TTTACGTCCA GATATTTATT ACGTGGCGTA TCAAAATTTA TATTCCAATA AAGGAGCTTC 900
CACAAAAGGA ATATTAGATG ATACAGCGGA TGGCTTTAGT GAAGAAAAAA TAAAAAAGAT 960
TATTCAATCT TTAAAAGACG GAACTTACTA TCCTCAACCT GTACGAAGAA TGTATATTGC 1020
AAAAAAGAAT TCTAAAAAGA TGAGACCTTT AGGAATTCCA ACTTTCACAG ATAAATTGAT 1080
CCAAGAAGCT GTGAGAATAA TTCTTGAATC TATCTATGAA CCGGTATTCG AAGATGTGTC 1140
TCACGGTTTT AGACCTCAAC GAAGCTGTCA CACAGCTTTG AAAACAATCA AAAGAGAGTT 1200
TGGCGGCGCA AGATGGTTTG TGGAGGGAGA TATAAAAGGC TGCTTCGATA ATATAGACCA 1260
CGTTACACTC ATTGGACTCA TCAATCTTAA AATCAAAGAT ATGAAAATGA GCCAATTGAT 1320
TTATAAATTT CTAAAAGCAG GTTATCTGGA AAACTGGCAG TATCACAAAA CTTACAGCGG 1380
AACACCTCAA GGTGGAATTC TATCTCCTCT TTTGGCCAAC ATCTATCTTC ATGAATTGGA 1440
TAAGTTTGTT TTACAACTCA AAATGAAGTT TGACCGAGAA AGTCCAGAAA GAATAACACC 1500
TGAATATCGG GAACTTCACA ATGAGATAAA AAGAATTTCT CACCGTCTCA AGAAGTTGGA 1560
GGGTGAAGAA AAAGCTAAAG TTCTTTTAGA ATATCAAGAA AAACGTAAAA GATTACCCAC 1620
ACTCCCCTGT ACCTCACAGA CAAATAAAGT ATTGAAATAC GTCCGGTATG CGGACGACTT 1680
CATTATCTCT GTTAAAGGAA GCAAAGAGGA CTGTCAATGG ATAAAAGAAC AATTAAAACT 1740
TTTTATTCAT AACAAGCTAA AAATGGAATT GAGTGAAGAA AAAACACTCA TCACACATAG 1800
CAGTCAACCC GCTCGTTTTC TGGGATATGA TATACGAGTA AGGAGAAGTG GAACGATAAA 1860
ACGATCTGGT AAAGTCAAAA AGAGAACACT CAATGGGAGT GTAGAACTCC TTATTCCTCT 1920
TCAAGACAAA ATTCGTCAAT TTATTTTTGA CAAGAAAATA GCTATCCAAA AGAAAGATAG 1980
CTCATGGTTT CCAGTTCACA GGAAATATCT TATTCGTTCA ACAGACTTAG AAATCATCAC 2040
AATTTATAAT TCTGAATTAA GAGGGATTTG TAATTACTAC GGTCTAGCAA GTAATTTTAA 2100
CCAGCTCAAT TATTTTGCTT ATCTTATGGA ATACAGCTGT CTAAAAACGA TAGCCTCCAA 2160
ACATAAGGGA ACACTTTCAA AAACCATTTC CATGTTTAAA GATGGAAGTG GTTCGTGGGG 2220
CATCCCGTAT GAGATAAAGC AAGGTAAGCA GCGCCGTTAT TTTGCAAATT TTAGTGAATG 2280
TAAATCCCCT TATCAATTTA CGGATGAGAT AAGTCAAGCT CCTGTATTGT ATGGCTATGC 2340
CCGGAATACT CTTGAAAACA GGTTAAAAGC TAAATGTTGT GAATTATGTG GAACATCTGA 2400
TGAAAATACT TCCTATGAAA TTCACCATGT CAATAAGGTC AAAAATCTTA AAGGCAAAGA 2460
AAAATGGGAA ATGGCAATGA TAGCGAAACA ACGTAAAACT CTTGTTGTAT GCTTTCATTG 2520
TCATCGTCAC GTGATTCATA AACACAAGTG AATTTTTACG AACGAACAAT AACAGAGCCG 2580
TATACTCCGA GAGGGGTACG TACGGTTCCC GAAGAGGGTG GTGCAAACCA GTCACAGTAA 2640
TGTGAACAAG GCGGTACCTC CCTACTTCAC CATATCATTT TTAATTCTAC GAATCTTTAT 2700
ACTGGCAAAC AATTTGACTG GAAAGTCATT CCTAAAGAGA AAACAAAAAG CGGCAAAGCT 2760
T 2761
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1800 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..1800

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

ATG AAA CCA ACA ATG GCA ATT TTA GAA AGA ATC AGT AAA AAT TCA CAA 48
Met Lys Pro Thr Met Ala Ile Leu Glu Arg Ile Ser Lys Asn Ser Gln
1 5 10 15

GAA AAT ATA GAC GAA GTT TTT ACA AGA CTT TAT CGT TAT CTT TTA CGT 96
Glu Asn Ile Asp Glu Val Phe Thr Arg Leu Tyr Arg Tyr Leu Leu Arg
20 25 30

CCA GAT ATT TAT TAC GTG GCG TAT CAA AAT TTA TAT TCC AAT AAA GGA 144
Pro Asp Ile Tyr Tyr Val Ala Tyr Gln Asn Leu Tyr Ser Asn Lys Gly
35 40 45

GCT TCC ACA AAA GGA ATA TTA GAT GAT ACA GCG GAT GGC TTT AGT GAA 192
Ala Ser Thr Lys Gly Ile Leu Asp Asp Thr Ala Asp Gly Phe Ser Glu
50 55 60

GAA AAA ATA AAA AAG ATT ATT CAA TCT TTA AAA GAC GGA ACT TAC TAT 240
Glu Lys Ile Lys Lys Ile Ile Gln Ser Leu Lys Asp Gly Thr Tyr Tyr
65 70 75 80

CCT CAA CCT GTA CGA AGA ATG TAT ATT GCA AAA AAG AAT TCT AAA AAG 288
Pro Gln Pro Val Arg Arg Met Tyr Ile Ala Lys Lys Asn Ser Lys Lys
85 90 95

ATG AGA CCT TTA GGA ATT CCA ACT TTC ACA GAT AAA TTG ATC CAA GAA 336
Met Arg Pro Leu Gly Ile Pro Thr Phe Thr Asp Lys Leu Ile Gln Glu
100 105 110

GCT GTG AGA ATA ATT CTT GAA TCT ATC TAT GAA CCG GTA TTC GAA GAT 384
Ala Val Arg Ile Ile Leu Glu Ser Ile Tyr Glu Pro Val Phe Glu Asp
115 120 125

GTG TCT CAC GGT TTT AGA CCT CAA CGA AGC TGT CAC ACA GCT TTG AAA 432
Val Ser His Gly Phe Arg Pro Gln Arg Ser Cys His Thr Ala Leu Lys
130 135 140

ACA ATC AAA AGA GAG TTT GGC GGC GCA AGA TGG TTT GTG GAG GGA GAT 480
Thr Ile Lys Arg Glu Phe Gly Gly Ala Arg Trp Phe Val Glu Gly Asp
145 150 155 160

ATA AAA GGC TGC TTC GAT AAT ATA GAC CAC GTT ACA CTC ATT GGA CTC 528
Ile Lys Gly Cys Phe Asp Asn Ile Asp His Val Thr Leu Ile Gly Leu
165 170 175

ATC AAT CTT AAA ATC AAA GAT ATG AAA ATG AGC CAA TTG ATT TAT AAA 576
Ile Asn Leu Lys Ile Lys Asp Met Lys Met Ser Gln Leu Ile Tyr Lys
180 185 190

TTT CTA AAA GCA GGT TAT CTG GAA AAC TGG CAG TAT CAC AAA ACT TAC 624
Phe Leu Lys Ala Gly Tyr Leu Glu Asn Trp Gln Tyr His Lys Thr Tyr
195 200 205

AGC GGA ACA CCT CAA GGT GGA ATT CTA TCT CCT CTT TTG GCC AAC ATC 672
Ser Gly Thr Pro Gln Gly Gly Ile Leu Ser Pro Leu Leu Ala Asn Ile
210 215 220

TAT CTT CAT GAA TTG GAT AAG TTT GTT TTA CAA CTC AAA ATG AAG TTT 720
Tyr Leu His Glu Leu Asp Lys Phe Val Leu Gln Leu Lys Met Lys Phe
225 230 235 240

GAC CGA GAA AGT CCA GAA AGA ATA ACA CCT GAA TAT CGG GAA CTT CAC 768
Asp Arg Glu Ser Pro Glu Arg Ile Thr Pro Glu Tyr Arg Glu Leu His
245 250 255

AAT GAG ATA AAA AGA ATT TCT CAC CGT CTC AAG AAG TTG GAG GGT GAA 816
Asn Glu Ile Lys Arg Ile Ser His Arg Leu Lys Lys Leu Glu Gly Glu
260 265 270

GAA AAA GCT AAA GTT CTT TTA GAA TAT CAA GAA AAA CGT AAA AGA TTA 864
Glu Lys Ala Lys Val Leu Leu Glu Tyr Gln Glu Lys Arg Lys Arg Leu
275 280 285

CCC ACA CTC CCC TGT ACC TCA CAG ACA AAT AAA GTA TTG AAA TAC GTC 912
Pro Thr Leu Pro Cys Thr Ser Gln Thr Asn Lys Val Leu Lys Tyr Val
290 295 300

CGG TAT GCG GAC GAC TTC ATT ATC TCT GTT AAA GGA AGC AAA GAG GAC 960
Arg Tyr Ala Asp Asp Phe Ile Ile Ser Val Lys Gly Ser Lys Glu Asp
305 310 315 320

TGT CAA TGG ATA AAA GAA CAA TTA AAA CTT TTT ATT CAT AAC AAG CTA 1008
Cys Gln Trp Ile Lys Glu Gln Leu Lys Leu Phe Ile His Asn Lys Leu
325 330 335

AAA ATG GAA TTG AGT GAA GAA AAA ACA CTC ATC ACA CAT AGC AGT CAA 1056
Lys Met Glu Leu Ser Glu Glu Lys Thr Leu Ile Thr His Ser Ser Gln
340 345 350

CCC GCT CGT TTT CTG GGA TAT GAT ATA CGA GTA AGG AGA AGT GGA ACG 1104
Pro Ala Arg Phe Leu Gly Tyr Asp Ile Arg Val Arg Arg Ser Gly Thr
355 360 365

ATA AAA CGA TCT GGT AAA GTC AAA AAG AGA ACA CTC AAT GGG AGT GTA 1152
Ile Lys Arg Ser Gly Lys Val Lys Lys Arg Thr Leu Asn Gly Ser Val
370 375 380

GAA CTC CTT ATT CCT CTT CAA GAC AAA ATT CGT CAA TTT ATT TTT GAC 1200
Glu Leu Leu Ile Pro Leu Gln Asp Lys Ile Arg Gln Phe Ile Phe Asp
385 390 395 400

AAG AAA ATA GCT ATC CAA AAG AAA GAT AGC TCA TGG TTT CCA GTT CAC 1248
Lys Lys Ile Ala Ile Gln Lys Lys Asp Ser Ser Trp Phe Pro Val His
405 410 415

AGG AAA TAT CTT ATT CGT TCA ACA GAC TTA GAA ATC ATC ACA ATT TAT 1296
Arg Lys Tyr Leu Ile Arg Ser Thr Asp Leu Glu Ile Ile Thr Ile Tyr
420 425 430

AAT TCT GAA TTA AGA GGG ATT TGT AAT TAC TAC GGT CTA GCA AGT AAT 1344
Asn Ser Glu Leu Arg Gly Ile Cys Asn Tyr Tyr Gly Leu Ala Ser Asn
435 440 445

TTT AAC CAG CTC AAT TAT TTT GCT TAT CTT ATG GAA TAC AGC TGT CTA 1392
Phe Asn Gln Leu Asn Tyr Phe Ala Tyr Leu Met Glu Tyr Ser Cys Leu
450 455 460

AAA ACG ATA GCC TCC AAA CAT AAG GGA ACA CTT TCA AAA ACC ATT TCC 1440
Lys Thr Ile Ala Ser Lys His Lys Gly Thr Leu Ser Lys Thr Ile Ser
465 470 475 480

ATG TTT AAA GAT GGA AGT GGT TCG TGG GGC ATC CCG TAT GAG ATA AAG 1488
Met Phe Lys Asp Gly Ser Gly Ser Trp Gly Ile Pro Tyr Glu Ile Lys
485 490 495

CAA GGT AAG CAG CGC CGT TAT TTT GCA AAT TTT AGT GAA TGT AAA TCC 1536
Gln Gly Lys Gln Arg Arg Tyr Phe Ala Asn Phe Ser Glu Cys Lys Ser
500 505 510

CCT TAT CAA TTT ACG GAT GAG ATA AGT CAA GCT CCT GTA TTG TAT GGC 1584
Pro Tyr Gln Phe Thr Asp Glu Ile Ser Gln Ala Pro Val Leu Tyr Gly
515 520 525

TAT GCC CGG AAT ACT CTT GAA AAC AGG TTA AAA GCT AAA TGT TGT GAA 1632
Tyr Ala Arg Asn Thr Leu Glu Asn Arg Leu Lys Ala Lys Cys Cys Glu
530 535 540

TTA TGT GGA ACA TCT GAT GAA AAT ACT TCC TAT GAA ATT CAC CAT GTC 1680
Leu Cys Gly Thr Ser Asp Glu Asn Thr Ser Tyr Glu Ile His His Val
545 550 555 560

AAT AAG GTC AAA AAT CTT AAA GGC AAA GAA AAA TGG GAA ATG GCA ATG 1728
Asn Lys Val Lys Asn Leu Lys Gly Lys Glu Lys Trp Glu Met Ala Met
565 570 575

ATA GCG AAA CAA CGT AAA ACT CTT GTT GTA TGC TTT CAT TGT CAT CGT 1776
Ile Ala Lys Gln Arg Lys Thr Leu Val Val Cys Phe His Cys His Arg
580 585 590

CAC GTG ATT CAT AAA CAC AAG TGA 1800
His Val Ile His Lys His Lys *
595 600



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 600 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Met Lys Pro Thr Met Ala Ile Leu Glu Arg Ile Ser Lys Asn Ser Gln
1 5 10 15

Glu Asn Ile Asp Glu Val Phe Thr Arg Leu Tyr Arg Tyr Leu Leu Arg
20 25 30

Pro Asp Ile Tyr Tyr Val Ala Tyr Gln Asn Leu Tyr Ser Asn Lys Gly
35 40 45

Ala Ser Thr Lys Gly Ile Leu Asp Asp Thr Ala Asp Gly Phe Ser Glu
50 55 60

Glu Lys Ile Lys Lys Ile Ile Gln Ser Leu Lys Asp Gly Thr Tyr Tyr
65 70 75 80

Pro Gln Pro Val Arg Arg Met Tyr Ile Ala Lys Lys Asn Ser Lys Lys
85 90 95

Met Arg Pro Leu Gly Ile Pro Thr Phe Thr Asp Lys Leu Ile Gln Glu
100 105 110

Ala Val Arg Ile Ile Leu Glu Ser Ile Tyr Glu Pro Val Phe Glu Asp
115 120 125

Val Ser His Gly Phe Arg Pro Gln Arg Ser Cys His Thr Ala Leu Lys
130 135 140

Thr Ile Lys Arg Glu Phe Gly Gly Ala Arg Trp Phe Val Glu Gly Asp
145 150 155 160

Ile Lys Gly Cys Phe Asp Asn Ile Asp His Val Thr Leu Ile Gly Leu
165 170 175

Ile Asn Leu Lys Ile Lys Asp Met Lys Met Ser Gln Leu Ile Tyr Lys
180 185 190

Phe Leu Lys Ala Gly Tyr Leu Glu Asn Trp Gln Tyr His Lys Thr Tyr
195 200 205

Ser Gly Thr Pro Gln Gly Gly Ile Leu Ser Pro Leu Leu Ala Asn Ile
210 215 220

Tyr Leu His Glu Leu Asp Lys Phe Val Leu Gln Leu Lys Met Lys Phe
225 230 235 240

Asp Arg Glu Ser Pro Glu Arg Ile Thr Pro Glu Tyr Arg Glu Leu His
245 250 255

Asn Glu Ile Lys Arg Ile Ser His Arg Leu Lys Lys Leu Glu Gly Glu
260 265 270

Glu Lys Ala Lys Val Leu Leu Glu Tyr Gln Glu Lys Arg Lys Arg Leu
275 280 285

Pro Thr Leu Pro Cys Thr Ser Gln Thr Asn Lys Val Leu Lys Tyr Val
290 295 300

Arg Tyr Ala Asp Asp Phe Ile Ile Ser Val Lys Gly Ser Lys Glu Asp
305 310 315 320

Cys Gln Trp Ile Lys Glu Gln Leu Lys Leu Phe Ile His Asn Lys Leu
325 330 335

Lys Met Glu Leu Ser Glu Glu Lys Thr Leu Ile Thr His Ser Ser Gln
340 345 350

Pro Ala Arg Phe Leu Gly Tyr Asp Ile Arg Val Arg Arg Ser Gly Thr
355 360 365

Ile Lys Arg Ser Gly Lys Val Lys Lys Arg Thr Leu Asn Gly Ser Val
370 375 380

Glu Leu Leu Ile Pro Leu Gln Asp Lys Ile Arg Gln Phe Ile Phe Asp
385 390 395 400

Lys Lys Ile Ala Ile Gln Lys Lys Asp Ser Ser Trp Phe Pro Val His
405 410 415

Arg Lys Tyr Leu Ile Arg Ser Thr Asp Leu Glu Ile Ile Thr Ile Tyr
420 425 430

Asn Ser Glu Leu Arg Gly Ile Cys Asn Tyr Tyr Gly Leu Ala Ser Asn
435 440 445

Phe Asn Gln Leu Asn Tyr Phe Ala Tyr Leu Met Glu Tyr Ser Cys Leu
450 455 460

Lys Thr Ile Ala Ser Lys His Lys Gly Thr Leu Ser Lys Thr Ile Ser
465 470 475 480

Met Phe Lys Asp Gly Ser Gly Ser Trp Gly Ile Pro Tyr Glu Ile Lys
485 490 495

Gln Gly Lys Gln Arg Arg Tyr Phe Ala Asn Phe Ser Glu Cys Lys Ser
500 505 510

Pro Tyr Gln Phe Thr Asp Glu Ile Ser Gln Ala Pro Val Leu Tyr Gly
515 520 525

Tyr Ala Arg Asn Thr Leu Glu Asn Arg Leu Lys Ala Lys Cys Cys Glu
530 535 540

Leu Cys Gly Thr Ser Asp Glu Asn Thr Ser Tyr Glu Ile His His Val
545 550 555 560

Asn Lys Val Lys Asn Leu Lys Gly Lys Glu Lys Trp Glu Met Ala Met
565 570 575

Ile Ala Lys Gln Arg Lys Thr Leu Val Val Cys Phe His Cys His Arg
580 585 590

His Val Ile His Lys His Lys *
595 600
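SEQ ID NO: 3 is the translation of the coding sequence given in SEQ ID NO: 2. The Python sketch below performs that codon-by-codon translation using the standard genetic code (NCBI translation table 1). It outputs one-letter amino acid codes, not the three-letter codes of the listing. For brevity it translates only the first and last rows of SEQ ID NO: 2. The table-building approach and the function name are illustrative choices.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    cds = "".join(cds.split()).upper()  # drop the spaces used in the listing
    if len(cds) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


# First and last rows of SEQ ID NO: 2, copied as printed in the listing.
first_row = "ATG AAA CCA ACA ATG GCA ATT TTA GAA AGA ATC AGT AAA AAT TCA CAA"
last_row = "CAC GTG ATT CAT AAA CAC AAG TGA"
print(translate(first_row))  # MKPTMAILERISKNSQ
print(translate(last_row))   # HVIHKHK*
```

These match "Met Lys Pro Thr Met Ala Ile Leu Glu Arg Ile Ser Lys Asn Ser Gln" and "His Val Ile His Lys His Lys *" in SEQ ID NO: 3.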

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 129 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

CGCTCTAGAA CTAGTGGATC CTTGCAACCC ACGTCGATCG TGAACACATC CATAACCATA 60
TCATTTTTAA TTCTACGAAT CTTTATACTG GGAATTCGAT ATCAAGCTTA TCGATACCGT 120
CGACCTCGA 129

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

TCTACCTCAT CTAGACATTT TCTCC 25

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 28 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

CGTTCGTAAA GCTAGCCTTG TGTTTATG 28

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

CACAAAGTGA TCATTTAACG AACG 24

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

TTGGGATCCT CATAAGCTTT GCCGC 25

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

CAAAGGATCC GATGAAACCA ACAATGGCAA 30

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

AGTGGCTTCC ATATGCTTGG TCATCACCTC ATC 33

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 39 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

GGTAGAACCA TATGAAATTC CTCCTCCCTA ATCAATTTT 39

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

AAAACCTCCA TATGAAACCA ACAATG 26
What is Claimed is:

1. A method for preparing a nucleotide integrase which
cleaves a double-stranded DNA substrate, said method comprising the
following steps:

(a) providing a DNA molecule comprising a group II intron DNA
sequence, wherein the group II intron DNA sequence encodes a group II
intron RNA and comprises an open reading frame sequence which encodes a
group II intron-encoded protein;

(b) introducing the DNA molecule into a host cell;

(c) expressing the group II intron DNA sequence in the host
cell, to provide an excised group II intron RNA and a group II
intron-encoded protein molecule, wherein the protein and the RNA combine
in the host cell to form the nucleotide integrase;

(d) obtaining the nucleotide integrase of step (c) from the
host cell.

2. The method of claim 1 wherein the DNA molecule further 
comprises a promoter operably linked to the group II intron DNA sequence. 

3. The method of claim 2 wherein the promoter is an 
inducible promoter. 

4. The method of claim 2 wherein the DNA molecule further 
comprises a nucleotide sequence which encodes a tag for facilitating 
isolation of the nucleotide integrase from the host cell; and 

wherein the method further comprises expressing the nucleotide 
sequence which encodes the tag in the host cell to provide a tagged group 
II intron-encoded protein; and 

wherein step (d) involves employing the tag to recover the 
nucleotide integrase. 

5. The method of claim 4 wherein the sequence which encodes
the tag is located at the 5' end or the 3' end of the open reading frame
sequence of the group II intron DNA sequence.

6. The method of claim 2 further comprising the steps of:
introducing a DNA sequence encoding a tRNA which corresponds
to the genetic code of the group II intron DNA sequence into the host cell
before step (b); and
expressing the tRNA-encoding DNA sequence in the host cell.

7. The method of claim 1 wherein the DNA molecule is
prepared by the following steps:

preparing a synthetic group II intron DNA sequence, wherein the
group II intron DNA sequence comprises a sequence of nucleotides that bind
to the recognition site of the substrate DNA; and

incorporating the synthetic group II intron DNA sequence into
the DNA molecule.

8 . The method of claim 1 wherein the group II intron DNA 
10 sequence comprises the DNA sequence of the Ll.ltrB intron and the RNP 

particles comprise an excised Ll.ltrB intron RNA and an ltra protein. 

9. The method of claim 1 wherein the group II intron DNA
sequence comprises a modified DNA sequence of the Ll.ltrB intron and the
RNP particles comprise a modified excised Ll.ltrB intron RNA and an LtrA
protein molecule.

10. The method of claim 1 wherein the group II intron DNA
sequence comprises a modified DNA sequence of the Ll.ltrB intron and the
RNP particles comprise a modified excised Ll.ltrB intron RNA and a modified
LtrA protein molecule.

11. The method of claim 1 wherein the host cell is E. coli.

12. The method of claim 8 wherein the host cell is E. coli. 

13. A method of preparing a nucleotide integrase in vitro
comprising the steps of:

(a) providing an isolated, excised, group II intron RNA;

(b) providing an isolated group II intron-encoded protein;
and

(c) incubating the excised, group II intron RNA with the
group II intron-encoded protein for a sufficient time to form a nucleotide
integrase comprising the excised, group II intron RNA bound to the group
II intron-encoded protein.

14. The method of claim 13 wherein the group II intron-encoded
protein is produced by a process comprising the steps of:

(a) providing a DNA molecule comprising an open reading frame
sequence of a group II intron, said open reading frame sequence being
operably linked to a promoter;

(b) introducing the DNA molecule of step (a) into a host
cell;

(c) expressing the open reading frame sequence in the host
cell to provide the group II intron-encoded protein; and

(d) isolating the group II intron-encoded protein from the host cell.

15. The method of claim 13 wherein the DNA molecule further
comprises a sequence which encodes a tag that facilitates isolation of the
group II intron-encoded protein from the host cell; and

wherein the method further comprises expressing the nucleotide
sequence which encodes the tag in the host cell to provide a tagged group
II intron-encoded protein; and

wherein step (d) involves obtaining a tagged nucleotide
integrase from the host cell.

16. The method of claim 15 wherein the sequence which encodes
the tag is located at a position selected from the 5' end and the 3' end of
the open reading frame sequence.

17. The method of claim 13 wherein the open reading frame
sequence encodes the LtrA protein and wherein the excised, group II RNA
is selected from the group consisting of an unmodified, excised Ll.ltrB
intron RNA and a modified, excised Ll.ltrB intron RNA.

18. A method of preparing a nucleotide integrase in vitro
comprising the steps of:

(a) providing an exogenous RNA which comprises an excised
group II intron RNA;

(b) providing an RNA-protein complex, wherein the RNA-protein
complex comprises a protein having an amino acid sequence encoded by a
group II intron and RNA that is free of excised, group II intron RNA
molecules having a sequence that encodes said protein; said RNA-protein
complex being prepared by the following steps:

(i) providing an isolated DNA molecule comprising a
group II intron DNA sequence, wherein said group II intron DNA sequence
encodes a group II intron-encoded protein and a splicing-defective group II
RNA;

(ii) introducing the DNA molecule into a host cell;

(iii) expressing the mutated group II intron DNA sequence
in the host cell, wherein an RNA-protein complex comprising the group II
intron-encoded protein and the splicing-defective group II RNA is formed
in the cell;

(iv) obtaining the RNA-protein complex of step (iii)
from the host cell; and

(c) incubating the exogenous RNA of step (a) with the RNP 
particle preparation for a sufficient time to form a nucleotide integrase 
comprising the excised group II RNA and the protein having an amino acid 
sequence encoded by a group II intron. 

19. A nucleotide integrase prepared according to a method
selected from the group consisting of the method of claim 1, the method of
claim 13, and the method of claim 18.

20. An isolated nucleotide integrase comprising an excised
Ll.ltrB intron RNA and an LtrA protein molecule.
Fig. 2



10 20 30 40 50 60 70 80 90

aagCttAGAG AAAAATAATG CGGTGCTTGG TCATCACCTC ATCCAATCAT TTTCTCCTGA TGACAATCTA ACTCCTGAAC AAATTCATGA 

100 110 120 130 140 150 160 170 180 

AATAGGTCGT CAAACCATAT TAGAATTTAC AGGTGGCGAA TATGAATTTG TGATTGCAAC CCACGTCGAT CGTGAACACA TCCATAAcjpff 

190 200 210 220 230 240 250 260 270 

GCGCCCAGAT AGGGTGTTAA GTCAAGTAGT TTAAGGTACT ACTCTGTAAG ATAACACAGA AAACAGCCAA CCTAACCGAA AAGCGAAAGC

280 290 300 310 320 330 340 350 360 

TGATACGGGA ACAGAGCACG GTTGGAAAGC GATGAGTTAC CTAAAGACAA TCGGGTACGA CTGAGTCGCA ATGTTAATCA GATATAAGGT 

370 380 390 400 410 420 430 440 450 

ATAAGTTGTG TTTACTGAAC GCAAGTTTCT AATTTCGGTT ATGTGTCGAT AGAGGAAAGT GTCTGAAACC TCTAGTACAA AGAAAGGTAA 

460 470 480 490 500 510 520 530 540 

GTTATGGTTG TGGACTTATC TGTTATCACC ACATTTGTAC AATCTGTAGG AGAACCTATG GGAACGAAAC GAAAGCGATG CCGAGAATCT 

550 560 570 580 590 600 610 620 630 

GAATTTACCA AGACTTAACA CTAACTGGGG ATACCCTAAA CAAGAATGCC TAATAGAAAG GAGGAAAAAG GCTATAGCAC TAGAGCTTGA 

640 650 660 670 680 690 700 710 720 

AAATCTTGCA AGGGTACGGA GTACTCGTAG TAGTCTGAGA AGGGTAACGC CCTTTACATG GCAAAGGGGT ACAGTTATTG TGTACTAAAA 

730 740 750 760 770 780 790 800 810 

TTAAAAATTG ATTAGGGAGG AAAACCTCAA AATGAAACCA ACAATGGCAA TTTTAGAAAG AATCAGTAAA AATTCACAAG AAAATATAGA 

MKP TMAI LER ISK NSQE NID

820 830 840 850 860 870 880 890 900 

CGAAGTTTTT ACAAGACTTT ATCGTTATCT TTTACGTCCA GATATTTATT ACGTGGCGTA TCAAAATTTA TATTCCAATA AAGGAGCTTC
EVF T R L Y RYL LRP DIYY V A Y QNL YSNK GAS

910 920 930 940 950 960 970 980 990 

CACAAAAGGA ATATTAGATG ATACAGCGGA TGGCTTTAGT GAAGAAAAAA TAAAAAAGAT TATTCAATCT TTAAAAGACG GAACTTACTA 
TKG ILDD TAD GFS EEKI K K I IQS LKDG TYY

1000 1010 1020 1030 1040 1050 1060 1070 1080

TCCTCAACCT GTACGAAGAA TGTATATTGC AAAAAAGAAT TCTAAAAAGA TGAGACCTTT AGGAATTCCA ACTTTCACAG ATAAATTGAT 
PQP VRRM Y I A KKN SKKM RPL GIP TFTD KLI 

1090 1100 1110 1120 1130 1140 1150 1160 1170 

CCAAGAAGCT GTGAGAATAA TTCTTGAATC TATCTATGAA CCGGTATTCG AAGATGTGTC TCACCGTTTT AGACCTCAAC GAAGCTGTCA 
Q E A VRII LES IYE PVFE DVS HGF RPQR SCH 

1180 1190 1200 1210 1220 1230 1240 1250 1260 

CACAGCTTTG AAAACAATCA AAAGAGAGTT TGGCGGCGCA AGATGGTTTG TGGAGGGAGA TATAAAAGGC TCCTTCCATA ATATAGACCA 
TAL KTIK REF GGA RWFV EGD IKG CFDN IDH

1270 1280 1290 1300 1310 1320 1330 1340 1350 

CGTTACACTC ATTGGACTCA TCAATCTTAA AATCAAAGAT ATGAAAATGA GCCAATTGAT TTATAAATTT CTAAAAGCAG GTTATCTGGA 
VTL I G L I NLK I K D MKMS Q L I YKF LKAG YLE 

1360 1370 1380 1390 1400 1410 1420 1430 1440

AAACTGGCAG TATCACAAAA CTTACAGCGG AACACCTCAA GGTCGAATTC TATCTCCTCT TTTGGCCAAC ATCTATCTTC ATGAATTGCA 
N W Q Y H K T V S C T P Q C G I L S P L L A M I Y L H E L D



1450 1460 1470 1480 1490 1500 1510 1520 1530

TAAGTTTCTT TTACAACTCA AAATCAACTT TCACCCACAA AGTCCAGAAA GAATAACACC TGAATATCCG CAACTTCACA ATGAGATAAA

K F L L Q L K I N F H P Q S P E R I T P E Y P Q L H N E I K
Fig. 2
(Cont.) 

1540 1550 1560 1570 1580 1590 1600 1610 1620 

AAGAATTTCT CACCGTCTCA AGAAGTTGGA GGGTGAAGAA AAAGCTAAAG TTCTTTTAGA ATATCAAGAA AAACGTAAAA GATTACCCAC 
RIS HRLK K L E GEE KAKV LLE YQE K R K R LPT 

1630 1640 1650 1660 1670 1680 1690 1700 1710 

ACTCCCCTGT ACCTCACAGA CAAATAAAGT ATTGAAATAC GTCCGGTATG CGGACGACTT CATTATCTCT GTTAAAGGAA GCAAAGAGGA 
LPC TSQT NKV LKY V R Y A DDF IIS VKGS KED 

1720 1730 1740 1750 1760 1770 1780 1790 1800 

CTGTCAATGG ATAAAAGAAC AATTAAAACT TTTTATTCAT AACAAGCTAA AAATGGAATT GAGTGAAGAA AAAACACTCA TCACACATAG 
CQW IKEQ LKL FIH NKLK MEL SEE KTLI THS 

1810 1820 1830 1840 1850 1860 1870 1880 1890

CAGTCAACCC GCTCGTTTTC TGGGATATGA TATACGAGTA AGGAGAAGTG GAACGATAAA ACGATCTGGT AAAGTCAAAA AGAGAACACT 
SQP ARFL G Y D IRV RRSG TIK RSG KVKK RTL 

1900 1910 1920 1930 1940 1950 1960 1970 1980 

CAATGGGAGT GTAGAACTCC TTATTCCTCT TCAAGACAAA ATTCGTCAAT TTATTTTTGA CAAGAAAATA GCTATCCAAA AGAAAGATAG 
NGS VELL IPL QDK IRQF I F D KKI AIQK KDS 

1990 2000 2010 2020 2030 2040 2050 2060 2070 

CTCATGGTTT CCAGTTCACA GGAAATATCT TATTCGTTCA ACAGACTTAG AAATCATCAC AATTTATAAT TCTGAATTAA GAGGGATTTG 
SWF PVHR KYL IRS TDLE I I T I Y N SELR GIC 

2080 2090 2100 2110 2120 2130 2140 2150 2160 

TAATTACTAC GGTCTAGCAA GTAATTTTAA CCAGCTCAAT TATTTTGCTT ATCTTATGGA ATACAGCTGT CTAAAAACGA TAGCCTCCAA 
NYY G L A S NFN QLN YFAY LME YSC LKTI ASK 

2170 2180 2190 2200 2210 2220 2230 2240 2250 

ACATAAGGGA ACACTTTCAA AAACCATTTC CATGTTTAAA GATGGAAGTG GTTCGTGGGG CATCCCGTAT GAGATAAAGC AAGGTAAGCA 
HKG TLSK TI S MFK DGSG SWG I P Y EIKQ GKQ 

2260 2270 2280 2290 2300 2310 2320 2330 2340

GCGCCGTTAT TTTGCAAATT TTAGTGAATC TAAATCCCC? TATCAATTTA CGGATGAGAT AAGTCAAGCT CCTGTATTGT ATGGCTATCC 
RRY FANF SEC KSP YQFT DEI SQA PVLY GYA 

2350 2360 2370 2380 2390 2400 2410 2420 2430 

CCGGAATACT CTTGAAAACA GGTTAAAAGC TAAATGTTGT GAATTATGTG GAACATCTGA TGAAAATACT TCCTATCAAA TTCACCA7GT 
R N T LENR LKA KCC ELCG TSD ENT SYEI HHV 

2440 2450 2460 2470 2480 2490 2500 2510 2520 

CAATAAGGTC AAAAATCTTA AAGGCAAAGA AAAATGGGAA ATGGCAATGA TAGCGAAACA ACGTAAAACT CTTGTTGTAT GCTTTCATTG
NKV KNLK GKE KWE M A M I AKQ RKT LVVC FHC 

2530 2540 2550 2560 2570 2580 2590 2600 2610 

TCATCGTCAC GTGATTCATA AACACAAGTG AATTTTTACC AACGAACAAT AACAGAGCCG TATACTCCGA GAGGGGTACG TACGGTTCCC 
HRH VIHK H K * 



2620 2630 2640 2650 2660 2670 

GAAGAGGGTG GTCCAAACCA GTCACAGTAA TGTGAACAAG GCGGTACCTC CCTACTTCAC 



2680 2690 2700 
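In Fig. 2, each amino-acid row gives the one-letter translation of the codons in the DNA row above it, read in the frame of the intron open reading frame. The following is a minimal Python sketch of that translation step, assuming the standard genetic code (NCBI translation table 1) and a reading frame supplied by the caller. `CODON_TABLE` and `translate` are hypothetical illustration helpers; they are not part of the methods claimed here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). With bases ordered T, C, A, G,
# product() enumerates codons TTT, TTC, TTA, TTG, TCT, ... in the same order as
# this 64-letter string; '*' marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA one codon at a time, starting at offset `frame` (0, 1 or 2).

    Whitespace, such as the 10-base grouping used in Fig. 2, is ignored.
    A codon containing anything other than A, C, G or T becomes 'X'.
    Translation stops after the first stop codon, which is kept as '*'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# Bases from the ATG of the first annotated row of Fig. 2, plus the first base
# ("C") of the next row, which completes the final codon.
orf_start = "ATGAAACCA ACAATGGCAA TTTTAGAAAG AATCAGTAAA AATTCACAAG AAAATATAGA C"
print(translate(orf_start))  # MKPTMAILERISKNSQENID
```

The result can be compared against the annotation beneath that row of the figure. Unreadable bases in the scanned listing would come out as `X` rather than being guessed.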
Fig. 5
Fig. 6
DNA Substrate